The 2016 election was one of the most polarizing in our nation’s history. A wealth of data is available to help us understand the landscape of the 2016 election and why voters decided to back Donald Trump. The first section, “A Polarized Nation”, highlights demographic and economic variables that distinguish Democratic and Republican counties. While Republican counties tend to have lower incomes than Democratic ones, they also have lower levels of poverty. The white share of the population is a strong indicator of county outcome. The next section, “News of the 2016 Presidential Election”, examines the text published by the liberal-leaning New York Times and the conservative-leaning Wall Street Journal (WSJ) from June to November 2016 using sentiment analysis, TF-IDF scoring, and bigram analysis. There are distinct differences between the publications. The third section, “2016 Split-Ticket Voting”, shows choropleths of all levels of elections that occurred in 2016, as well as the vote difference between the Presidential election and all other elections. Western Wisconsin supports Democratic House candidates while also leaning Republican in the Presidential race. Local elections in the South for which data were available were consistently more Republican than the Presidential vote in the same counties. In the final section, “County Factors in 2016 Outcome”, association rule mining is used as a correlative method to identify demographics that commonly accompany Democratic or Republican wins in counties. This brings the narrative full circle, as many of the demographics considered in the first section are strongly associated with polarized outcomes.
Zip files were downloaded from the following sites. The data files were already in a clean CSV format.
Data processing included the following:
Data was downloaded from IPUMS using their interactive data puller. The time period for the data is 2005-2016 as those are the years that provide county FIPS codes. The following variables were used:
Data was aggregated to the county level using statistics weighted by the person weight variable.
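As a rough illustration of this aggregation step, the sketch below computes a person-weighted mean per county in plain Python. The field names (STATEFIP, COUNTYFIP, PERWT, INCOME) and the sample records are assumptions made for the example, not the project's actual extract or code.

```python
from collections import defaultdict

# Hypothetical person-level records. Field names and values are
# illustrative assumptions, not taken from the real IPUMS extract.
people = [
    {"STATEFIP": 55, "COUNTYFIP": 1, "PERWT": 100, "INCOME": 30000},
    {"STATEFIP": 55, "COUNTYFIP": 1, "PERWT": 50, "INCOME": 50000},
    {"STATEFIP": 55, "COUNTYFIP": 1, "PERWT": 50, "INCOME": 70000},
    {"STATEFIP": 55, "COUNTYFIP": 3, "PERWT": 200, "INCOME": 40000},
    {"STATEFIP": 55, "COUNTYFIP": 3, "PERWT": 100, "INCOME": 10000},
]


def weighted_county_mean(records, value_key, weight_key="PERWT"):
    """Return {(state, county): weighted mean of value_key}."""
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for row in records:
        # County FIPS codes repeat across states, so key on both.
        county = (row["STATEFIP"], row["COUNTYFIP"])
        weighted_sum[county] += row[value_key] * row[weight_key]
        weight_total[county] += row[weight_key]
    return {c: weighted_sum[c] / weight_total[c] for c in weighted_sum}


county_income = weighted_county_mean(people, "INCOME")
for county, mean_income in sorted(county_income.items()):
    print(county, round(mean_income, 2))
```

Each person contributes in proportion to their weight, so a respondent with a weight of 100 counts twice as much as one with a weight of 50. Weighted shares (for example, a poverty rate) follow the same pattern with a 0/1 indicator as the value.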
ACS data from IPUMS USA, University of Minnesota, www.ipums.org
Data was downloaded from Factiva in 100 article chunks. The search parameters were as follows:
The search returned 3,013 results. The raw data was downloaded in RTF format and converted to plain text using the striprtf package in Python. The text was then cleaned, tokenized, and stemmed, and stop words were removed, using nltk. Sentiment was calculated using the nltk VADER sentiment analyzer. TF-IDF analysis was performed using the nltk package. TensorFlow was used to train a bidirectional LSTM neural network that predicts the news publication from the cleaned, tokenized text.
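To make the TF-IDF and bigram steps concrete, here is a minimal dependency-free sketch. It uses one common TF-IDF variant (term frequency normalized by document length, multiplied by the log of the inverse document frequency); the toy token lists and function names are assumptions for illustration and do not reproduce the project's nltk code.

```python
import math
from collections import Counter

# Toy, already-cleaned token lists standing in for each publication.
# These are invented examples, not the Factiva articles.
docs = {
    "NYT": "trump campaign rally clinton email campaign".split(),
    "WSJ": "trump tax plan market tax plan clinton".split(),
}


def tf_idf(documents):
    """Score each term per document as (count / length) * log(N / df)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents.values():
        doc_freq.update(set(tokens))  # count each term once per document
    scores = {}
    for name, tokens in documents.items():
        counts = Counter(tokens)
        scores[name] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


def bigram_counts(tokens):
    """Count adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))


scores = tf_idf(docs)
wsj_bigrams = bigram_counts(docs["WSJ"])
print(sorted(scores["WSJ"].items(), key=lambda kv: -kv[1])[:3])
print(wsj_bigrams.most_common(2))
```

Terms that appear in both publications (here "trump" and "clinton") score zero under this variant, which is why TF-IDF surfaces the vocabulary that sets one outlet apart from the other.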
Association rules are a method for showing IF-THEN correlations. This network graph shows ACS demographic variables and the rules that are correlated with U.S. counties picking the Democratic or Republican candidate in the 2016 election. The shading of each rule indicates the degree of dependence among the variables in that rule. Hover your mouse over a rule to see its statistics. Hover over any node to see its direct connections.
For background on the metrics, read here
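The statistics usually reported for an association rule are support, confidence, and lift; that the graph reports exactly these three is an assumption. The sketch below computes them for a single IF-THEN rule over a handful of made-up county "transactions". The item labels and the function name are hypothetical.

```python
# Each county is a set of categorical attributes plus its 2016 winner.
# The labels and data are invented purely for illustration.
counties = [
    {"high_white_share", "low_poverty", "REP"},
    {"high_white_share", "REP"},
    {"high_white_share", "low_poverty", "DEM"},
    {"low_poverty", "DEM"},
]


def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift for the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    n_ante = sum(antecedent <= t for t in transactions)
    n_cons = sum(consequent <= t for t in transactions)
    n_both = sum((antecedent | consequent) <= t for t in transactions)
    support = n_both / n                      # P(A and C)
    confidence = n_both / n_ante if n_ante else 0.0   # P(C | A)
    # Lift above 1 means A and C co-occur more often than if independent.
    lift = confidence / (n_cons / n) if n_cons else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


metrics = rule_metrics(counties, {"high_white_share"}, {"REP"})
print(metrics)
```

A lift of 1 indicates that the IF side and the THEN side are statistically independent, so a measure like lift is one plausible basis for shading rules by degree of dependence.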